Configurable character variant unification

ABSTRACT

A method, system, and computer program product for configurable character variant unification are provided in the illustrative embodiments. A determination is made that a unification profile is applicable to a circumstance in which a character variant has been selected. The character variant is a variation of a character in a set of variations of the character such that each variation of the character in the set is represented by a unique Unicode code point. A unification repository is identified according to the profile. A determination is made whether the character variant satisfies a unification rule. Responsive to the character variant not satisfying the unification rule, a different variation of the character is selected from the unification repository, the different variation forming a replacement character variant. The replacement character variant is used in place of the character variant.

TECHNICAL FIELD

The present invention relates generally to a method, system, and computer program product for providing consistent computer input in multiple languages. More particularly, the present invention relates to a method, system, and computer program product for configurable unification of character variants received from data input.

BACKGROUND

There are alphabet and non-alphabet languages in the world. For example, Chinese, Japanese, and Korean borrowed alphabetic elements to represent their own phonetic symbols or strokes.

A computer keyboard is a common device for providing computer input. A keyboard is language-specific such that the alphabet or non-alphabet keys available on the keyboard can be pressed to directly input only those characters or symbols in the keyboard's language that are assigned to those keys. To input other characters or symbols in the language, a user may need to press a combination of keys on the keyboard to invoke a specific input method application for the language.

Many languages have sets of characters or symbols (e.g., the character alphabet in English, or phonetic or stroke alphabets in other languages) that are too large to accommodate on a keyboard. Many languages need other ways of mapping the keyboard keys to the characters or symbols in the language's set of characters or symbols. Using the keyboard keys according to the mapping produces the mapped characters or symbols in the language. Furthermore, the phonetic or stroke alphabets of many languages do not use characters to form words in the manner of the English language, but have a single character or a collection of characters that represent words. Thus, providing computer input in many languages is not as simple as pressing the letter-keys on the keyboard, but is an indirect process of pressing a combination of keys to generate characters not available as keys on the keyboard.

Unicode is a method of coding characters of multiple languages. A Unicode table comprises unique codes, called code points, assigned to characters of one or more languages. A code point comprises an alphanumeric representation that can be generated on commonly used keyboard configurations, such as an English language QWERTY keyboard.

To enter a code point, the user generally supplies an indication that the alphanumeric string following the indication is a Unicode code point and is to be translated using a Unicode table to generate a character. For example, using a QWERTY keyboard, the user presses the ALT key, keeps the ALT key depressed while entering the code point, and releases the ALT key when the code point entry is complete.

An application called a Unicode input method application (hereinafter, “input method” or “UIM”) intercepts the Unicode code point that the user enters. A Unicode editor is an example UIM. The UIM looks up a Unicode table to find the character that matches the code point that the user entered. The UIM supplies the character to a target application to which the user is supplying the input.
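The lookup a UIM performs can be illustrated with a minimal Python sketch. It assumes the code point is typed in the common `U+XXXX` hexadecimal notation (other entry conventions, including the ALT-key method above, are not modeled), and the names `parse_code_point`, `input_method_lookup`, and `deliver` are hypothetical; the target application is stood in for by a plain list.

```python
def parse_code_point(entry: str) -> int:
    """Parse a code point typed as 'U+5317', 'u+5317', or bare hex '5317'."""
    text = entry.strip()
    if text[:2].upper() == "U+":
        text = text[2:]
    value = int(text, 16)  # raises ValueError for non-hexadecimal input
    # Reject values outside the Unicode range and surrogate code points.
    if not 0 <= value <= 0x10FFFF or 0xD800 <= value <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {entry!r}")
    return value


def input_method_lookup(entry: str) -> str:
    """Return the character for the entered code point."""
    return chr(parse_code_point(entry))


def deliver(entry: str, target_buffer: list) -> None:
    """Supply the looked-up character to a target application, modeled as a list."""
    target_buffer.append(input_method_lookup(entry))


if __name__ == "__main__":
    buffer: list = []
    deliver("U+5317", buffer)
    print([f"U+{ord(ch):04X}" for ch in buffer])  # ['U+5317']
```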

Different sections in a Unicode table comprise different sets of unique code points to represent different sets of characters in different languages. In other words, a code point in all of Unicode is unique to a specific character in a specific language.

SUMMARY

The illustrative embodiments provide a method, system, and computer program product for configurable character variant unification. An embodiment includes a method for configurable character variant unification. The embodiment determines that a unification profile is applicable to a circumstance in which a character variant has been selected, wherein the character variant is a variation of a character in a set of variations of the character such that each variation of the character in the set is represented by a unique Unicode code point. The embodiment identifies a unification repository according to the profile. The embodiment determines whether the character variant satisfies a unification rule. The embodiment selects, responsive to the character variant not satisfying the unification rule, a different variation of the character from the unification repository, the different variation forming a replacement character variant. The embodiment uses the replacement character variant in place of the character variant.
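The five steps of this method can be illustrated with a small Python sketch. It is not the claimed implementation: the `UnificationProfile` class, the `unify` function, the dictionary-based repository, the sample "han-only" profile, and the fallback of returning the original variant when no profile applies or no replacement satisfies the rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Circumstance = Dict[str, str]


@dataclass
class UnificationProfile:
    name: str
    applies_to: Callable[[Circumstance], bool]  # is the profile applicable to the circumstance?
    repository: Dict[str, List[str]]            # variant -> other variations of the same character
    rule: Callable[[str], bool]                 # does a variant satisfy the unification rule?


def find_profile(circumstance: Circumstance,
                 profiles: List[UnificationProfile]) -> Optional[UnificationProfile]:
    return next((p for p in profiles if p.applies_to(circumstance)), None)


def unify(variant: str, circumstance: Circumstance, profiles: List[UnificationProfile]) -> str:
    profile = find_profile(circumstance, profiles)  # 1. applicable profile
    if profile is None:
        return variant                              # assumed: no profile, no change
    repository = profile.repository                 # 2. repository identified by the profile
    if profile.rule(variant):                       # 3. variant already satisfies the rule
        return variant
    for candidate in repository.get(variant, []):   # 4. select a different variation
        if profile.rule(candidate):
            return candidate                        # 5. replacement used in place of the variant
    return variant                                  # assumed: keep selection if nothing qualifies


# Illustrative profile: unify to the CJK Unified Ideographs block (U+4E00..U+9FFF).
HAN_ONLY = UnificationProfile(
    name="han-only",
    applies_to=lambda c: c.get("context") == "han-only",
    repository={"\uF963": ["\u5317", "\U0002F82B"], "\U0002F82B": ["\u5317", "\uF963"]},
    rule=lambda ch: 0x4E00 <= ord(ch) <= 0x9FFF,
)

if __name__ == "__main__":
    result = unify("\uF963", {"context": "han-only"}, [HAN_ONLY])
    print(f"U+{ord(result):04X}")  # U+5317
```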

Another embodiment includes a computer usable program product comprising a computer readable storage device including computer usable code for configurable character variant unification. The embodiment further includes computer usable code for determining that a unification profile is applicable to a circumstance in which a character variant has been selected, wherein the character variant is a variation of a character in a set of variations of the character such that each variation of the character in the set is represented by a unique Unicode code point. The embodiment further includes computer usable code for identifying a unification repository according to the profile. The embodiment further includes computer usable code for determining whether the character variant satisfies a unification rule. The embodiment further includes computer usable code for selecting, responsive to the character variant not satisfying the unification rule, a different variation of the character from the unification repository, the different variation forming a replacement character variant. The embodiment further includes computer usable code for using the replacement character variant in place of the character variant.

Another embodiment includes a data processing system for configurable character variant unification. The embodiment further includes a storage device including a storage medium, wherein the storage device stores computer usable program code. The embodiment further includes a processor, wherein the processor executes the computer usable program code. The embodiment further includes computer usable code for determining that a unification profile is applicable to a circumstance in which a character variant has been selected, wherein the character variant is a variation of a character in a set of variations of the character such that each variation of the character in the set is represented by a unique Unicode code point. The embodiment further includes computer usable code for identifying a unification repository according to the profile. The embodiment further includes computer usable code for determining whether the character variant satisfies a unification rule. The embodiment further includes computer usable code for selecting, responsive to the character variant not satisfying the unification rule, a different variation of the character from the unification repository, the different variation forming a replacement character variant. The embodiment further includes computer usable code for using the replacement character variant in place of the character variant.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objectives and advantages thereof, will best be understood by reference to the following detailed description of the illustrative embodiments when read in conjunction with the accompanying drawings, wherein:

FIG. 1 depicts a block diagram of a network of data processing systems in which illustrative embodiments may be implemented;

FIG. 2 depicts a block diagram of a data processing system in which illustrative embodiments may be implemented;

FIG. 3 depicts a table of example resemblance variants of an example character that can be configurably unified in accordance with an illustrative embodiment;

FIG. 4 depicts a block diagram of a configuration for configurable character variant unification in accordance with an illustrative embodiment;

FIG. 5 depicts a flowchart of an example process for configurable character variant unification in accordance with an illustrative embodiment; and

FIG. 6 depicts a flowchart of an example process for configuring character variant unification in accordance with an illustrative embodiment.

DETAILED DESCRIPTION

A character variant is a variation of a character within a language or across different languages. A first type of character variants involves different characters in a given language or across different languages, where the different characters look different but are pronounced in a similar manner, convey similar meanings, or both. Variants of this type are referred to hereinafter as ‘distinct variants’. For example, in the Chinese language, this type of character variant can be found between simplified Chinese and traditional Chinese. The simplified character is visually different from the traditional character, but the simplified and the traditional characters are pronounced approximately the same and convey approximately the same meanings.

A second type of character variants is called ‘resemblance variants’. Resemblance variants are different characters in a given language or across different languages, where the different characters look similar but may be pronounced in a similar manner or in different manners, may convey similar meanings or different meanings, or some combination thereof. Often, resemblance variants have their origin in a character in one language, e.g., Chinese, where the character was adopted by other languages, e.g., Japanese or Korean, and gradually became a part of those languages without a change of the written expression of the character.

As a result, the original character and its resemblance variants were all added to the Unicode tables that contain characters from different languages, each character and each of its resemblance variants having a distinct code point and being treated as a unique character by computers, although they look visually alike. A user can generally input all the character variants by using a UIM or other input methods.

The illustrative embodiments recognize a particular problem with resemblance variants. If a user searches for a character, and if the character has resemblance variants, the user is presented with the resemblance variants. For example, if the user is searching for character 302 in FIG. 3, the user may be presented with characters 302, 304, and 306, where characters 304 and 306 are resemblance variants of a common character, e.g., of character 302.

The illustrative embodiments recognize that when faced with resemblance variants, a user may unintentionally, unknowingly, or even maliciously select a different resemblance variant than the variant that was intended. The illustrative embodiments recognize that selecting or entering a different resemblance variant of a character than an intended resemblance variant of the character can pose a variety of problems in data management.

For example, a particular variant may not be allowed in information processing in certain languages or certain regions. Entering a prohibited variant can therefore cause errors, costs, and delays in processing of the information in which the prohibited variant is included.

As another example, a user may be looking for the character to enter as part of a user ID, password, filename, or other phrase during the information processing. Selecting the incorrect variant can cause a login error, a login lockout, security flagging, an existing file not being found, a new file being created with a name that will not be found by others, and generally data being created or manipulated in an inconsistent or erroneous fashion.

As another example, suppose the user is engaged in a record manipulation operation in a database. Using wrong, inconsistent, or different variants can result in ghost records being created in the database. Using wrong, inconsistent, or different variants can also increase the data processing time and resource usage due to the extra effort needed to process the different variants, e.g., by employing different language processing tools for processing the variants.

The illustrative embodiments further recognize that not only do resemblance variants have the potential to cause data processing complications, they also have the potential to cause social and cross-cultural issues and insensitivities. Thus, the illustrative embodiments recognize that unintended, accidental, or malicious misuse of resemblance variants can adversely affect the quality of data where used, and can also have social, geographical, political, and economic consequences.

The illustrative embodiments used to describe the invention generally address and solve the above-described problems and other problems related to using character variants. The illustrative embodiments provide a method, system, and computer program product for configurable character variant unification.

Unification is the process of unifying one or more character variants back to a common character. A unification database (database, databases) according to an embodiment is a repository of variants that can be unified to a character in a given language. For example, a unification database for the simplified Chinese language includes a list of characters, their corresponding code points in simplified Chinese, and their respective distinct variants, resemblance variants, or both. Any number of unification databases can be created for any number of languages without limitation within the scope of the illustrative embodiments.
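One possible in-memory shape for such a database entry (a character, its code point, and its distinct and resemblance variants) is sketched below in Python. The class names, the reverse index, and the two sample entries (北 with the resemblance variants of FIG. 3, and 国 with its traditional form 國 as a distinct variant) are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RepositoryEntry:
    character: str
    distinct_variants: List[str] = field(default_factory=list)
    resemblance_variants: List[str] = field(default_factory=list)

    @property
    def code_point(self) -> str:
        return f"U+{ord(self.character):04X}"


class UnificationDatabase:
    def __init__(self, language: str, entries: List[RepositoryEntry]):
        self.language = language
        self._by_character = {e.character: e for e in entries}
        # Reverse index: every variant points back to the entry of its common character.
        self._by_variant: Dict[str, RepositoryEntry] = {}
        for e in entries:
            for v in e.distinct_variants + e.resemblance_variants:
                self._by_variant[v] = e

    def unify(self, variant: str) -> Optional[str]:
        """Return the common character a variant unifies to, or None if unknown."""
        if variant in self._by_character:
            return variant
        entry = self._by_variant.get(variant)
        return entry.character if entry else None


SIMPLIFIED_CHINESE = UnificationDatabase("zh-Hans", [
    RepositoryEntry("\u5317", resemblance_variants=["\uF963", "\U0002F82B"]),
    RepositoryEntry("\u56FD", distinct_variants=["\u570B"]),
])
```

With this shape, `SIMPLIFIED_CHINESE.unify("\u570B")` returns `"\u56FD"`, and characters the database does not know about yield `None`, leaving the decision to the caller.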

Furthermore, more than one unification database may exist for a given language. A custom unification database that includes entries for characters in multiple languages and their variants from a combination of languages is also contemplated within the scope of the illustrative embodiments. The unification database can take any suitable form, including but not limited to a relational database, a flat file, an XML file, an index file, a spreadsheet, a table, and the like.
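As one example of the flat-file form, the Python sketch below loads a small comma-separated file into a variant-to-character map and merges several such maps into a custom database. The file layout (`character,variants`, with space-separated `U+XXXX` variants) and the rule that earlier databases win on conflicts are assumptions for illustration only.

```python
import csv
import io
from typing import Dict

# Hypothetical flat-file layout: a common character followed by its variants,
# all written as U+XXXX code points, with variants separated by spaces.
FLAT_FILE = """character,variants
U+5317,U+F963 U+2F82B
U+56FD,U+570B
"""


def parse_cp(token: str) -> str:
    """Convert 'U+XXXX' notation to the character it names."""
    return chr(int(token.strip()[2:], 16))


def load_flat_file(text: str) -> Dict[str, str]:
    """Build a variant -> common character map from the flat-file text."""
    mapping: Dict[str, str] = {}
    for row in csv.DictReader(io.StringIO(text)):
        common = parse_cp(row["character"])
        mapping[common] = common
        for token in row["variants"].split():
            mapping[parse_cp(token)] = common
    return mapping


def combine(*databases: Dict[str, str]) -> Dict[str, str]:
    """Merge several databases into one custom database; earlier ones win on conflicts."""
    combined: Dict[str, str] = {}
    for database in databases:
        for variant, common in database.items():
            combined.setdefault(variant, common)
    return combined
```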

A unification rule (rule, rules) is logic in any suitable form to resolve a variant to an intended character or a different variant using one or more unification databases. For example, given a character variant selected by a user, an embodiment uses a unification rule to select a suitable changed variant of the character from a unification database.
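Because a rule may be "logic in any suitable form", one simple form is a function that takes the selected variant and a database and returns the variant to use. The Python sketch below assumes that form; the database layout (each variant mapped to its whole group, common character first) and the two example rules `to_common_character` and `keep_selection` are hypothetical.

```python
from typing import Callable, Dict, List

# Illustrative database: every variant maps to its whole variant group,
# with the common character listed first (an assumed convention).
GROUP_BEI = ["\u5317", "\uF963", "\U0002F82B"]
DATABASE: Dict[str, List[str]] = {variant: GROUP_BEI for variant in GROUP_BEI}

# A rule modeled as a plain function: (selected variant, database) -> variant to use.
UnificationRule = Callable[[str, Dict[str, List[str]]], str]


def to_common_character(variant: str, database: Dict[str, List[str]]) -> str:
    """Resolve any known variant to the first (common) member of its group."""
    group = database.get(variant)
    return group[0] if group else variant


def keep_selection(variant: str, database: Dict[str, List[str]]) -> str:
    """A permissive rule that accepts whatever the user selected."""
    return variant


def apply_rule(rule: UnificationRule, variant: str, database: Dict[str, List[str]]) -> str:
    """Use the given rule to pick the changed variant for a selection."""
    return rule(variant, database)
```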

Any number of unification rules is permissible without departing from the scope of the illustrative embodiments. A set of unification rules according to the illustrative embodiments can include unification rules for any combination of different languages, different geographical regions, different locales, and different contexts of usage. Furthermore, different unification rules may produce different changed variants depending on various considerations. For example, if the user selects a variant at login time for use in a user ID, an embodiment uses a different unification rule according to the login context than the unification rule used when the user selects a variant to embed in a document. The different unification rules may use the same or different unification databases, and may produce the same or different changed variants under the different contexts.
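Selecting a rule by language, region, and context of usage might look like the Python sketch below. The registry keyed by `(language, region, context)` with `None` as a wildcard, the two example rules, and the fallback to `document_rule` are hypothetical choices made only to show how a login selection and a document selection can be treated differently.

```python
from typing import Callable, Dict, Optional, Tuple

Rule = Callable[[str], str]

# Illustrative variant -> common character map shared by the rules below.
COMMON = {"\uF963": "\u5317", "\U0002F82B": "\u5317"}


def login_rule(variant: str) -> str:
    """For identifiers such as user IDs, always unify so the same ID is produced."""
    return COMMON.get(variant, variant)


def document_rule(variant: str) -> str:
    """For document text, this example keeps the user's selection unchanged."""
    return variant


# Rules registered by (language, region, context); None acts as a wildcard.
RULES: Dict[Tuple[Optional[str], Optional[str], str], Rule] = {
    ("zh", None, "login"): login_rule,
    (None, None, "document"): document_rule,
}


def select_rule(language: str, region: str, context: str) -> Rule:
    """Pick the most specific registered rule for the circumstance of the selection."""
    for key in ((language, region, context), (language, None, context), (None, None, context)):
        if key in RULES:
            return RULES[key]
    return document_rule  # assumed default when nothing is registered
```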

A unification profile (profile, profiles) comprises one or more unification levels (level, levels). A unification level defines how and which unification databases are to be combined for a particular unification exercise. A unification profile applies to a user, a group of users, a document, an application, a data storage, a locale, a geographical region, or some combination thereof.

For example, a user can define four example levels: Simplified Chinese only (level 1); Traditional Chinese only (level 2); Simplified Chinese and Traditional Chinese (level 3); and Simplified Chinese, Traditional Chinese, and Japanese Kanji (level 4). The user can further define, for example, that different levels apply to different users, groups, applications, or storage, in a manner that further narrows or defines the application of the profile.
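The four example levels can be represented as ordered lists of database names, with each level producing one combined repository. In the Python sketch below, the database contents are illustrative (which entry sits in which language's database is assumed purely for the example), and resolving conflicts in favor of the earlier-listed database is an assumption, since the text leaves "how" databases are combined open.

```python
from typing import Dict, List

# Illustrative databases (variant -> character to unify to); which entry lives in
# which database is assumed purely for the example.
DATABASES: Dict[str, Dict[str, str]] = {
    "simplified_chinese": {"\u570B": "\u56FD"},
    "traditional_chinese": {"\uF963": "\u5317"},
    "japanese_kanji": {"\U0002F82B": "\u5317"},
}

# The four example levels of one profile, each naming the databases it combines.
PROFILE_LEVELS: Dict[int, List[str]] = {
    1: ["simplified_chinese"],
    2: ["traditional_chinese"],
    3: ["simplified_chinese", "traditional_chinese"],
    4: ["simplified_chinese", "traditional_chinese", "japanese_kanji"],
}


def combine_level(level: int) -> Dict[str, str]:
    """Merge the databases of a level; earlier databases win on conflicts (an assumption)."""
    combined: Dict[str, str] = {}
    for name in PROFILE_LEVELS[level]:
        for variant, target in DATABASES[name].items():
            combined.setdefault(variant, target)
    return combined
```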

For example, a profile might apply to a group and a level therein might apply to a specific user in that group. As another example, a profile might apply to a geographical region and a level therein might apply to a specific group operating in that region. These example ways of constructing profiles and levels, and example applications of the profiles and levels, are not intended to be limiting on the illustrative embodiments. From this disclosure, those of ordinary skill in the art will be able to conceive many other ways of constructing and using profiles and levels, and the same are contemplated within the scope of the illustrative embodiments.
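The first scoping example, a profile that applies to a group with a level that applies to a specific user in that group, can be sketched in Python as follows. The `GroupProfile` class, the per-user override map, the sample "finance-cjk" profile, and the first-matching-profile-wins behavior are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class GroupProfile:
    name: str
    group: str                    # the group of users the profile applies to
    default_level: int            # level used for members without an override
    user_levels: Dict[str, int] = field(default_factory=dict)  # per-user levels


def resolve_level(profiles: List[GroupProfile], user: str,
                  user_groups: Set[str]) -> Optional[Tuple[str, int]]:
    """Return (profile name, level) for the user, or None if no profile applies."""
    for profile in profiles:  # first matching profile wins (an assumption)
        if profile.group in user_groups:
            return profile.name, profile.user_levels.get(user, profile.default_level)
    return None


PROFILES = [GroupProfile("finance-cjk", "finance", default_level=3, user_levels={"alice": 1})]
```

Here a member of the finance group without an override gets level 3, while user "alice" is narrowed to level 1 within the same profile.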

In operation, when a user is going to be using character variants, an embodiment selects a profile and a level in the profile according to which the embodiment will unify the variants to produce an output variant for the user-selected variant. In one embodiment, the user selects the profile and the level. In another embodiment, the profile and level are selected on behalf of the user, such as by an administrator. In another embodiment, a policy determines the profile and level that the embodiment should use.
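The text names three possible selectors (the user, an administrator, and a policy) as alternative embodiments without ranking them. The Python sketch below shows one way an implementation might reconcile them if more than one is present; the precedence of policy over administrator over user, and the fallback default, are assumptions for illustration only.

```python
from typing import Optional, Tuple

Selection = Tuple[str, int]  # (profile name, level)


def choose_profile_and_level(
    user_choice: Optional[Selection],
    admin_choice: Optional[Selection],
    policy_choice: Optional[Selection],
    default: Selection,
) -> Tuple[Selection, str]:
    """Return the selection to use and who made it.

    Assumed precedence (not stated in the disclosure): policy, then administrator, then user.
    """
    for source, choice in (("policy", policy_choice),
                           ("administrator", admin_choice),
                           ("user", user_choice)):
        if choice is not None:
            return choice, source
    return default, "default"
```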

Once the profile and the level are selected, an embodiment performs the variant unification on a character variant selected by the user to produce an output character variant. The embodiment uses the one or more unification databases corresponding to the selected profile and level, according to one or more unification rules governing the circumstances of the selection of the character variant by the user.

An embodiment allows a user, an administrator, or both to create, modify, or manipulate a unification profile, a unification level within a unification profile, a unification rule, a unification database, or some combination thereof. For example, one embodiment allows a user, whose variant selections are to be unified, to manipulate a level but not the profile itself, and allows an administrator to manipulate the profile. Another example embodiment allows the user to create or manipulate entries in a unification database but only allows an administrator to manipulate unification rules. Another example embodiment allows different users to manipulate different profiles, levels, rules, databases, or a combination thereof.
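A role-based permission check is one simple way to express these example divisions of control. The Python sketch below combines the first two example embodiments (a user may change a level and database entries; an administrator may change anything); the `Artifact` enumeration and the `PERMISSIONS` table are hypothetical.

```python
from enum import Enum
from typing import Dict, Set


class Artifact(Enum):
    PROFILE = "profile"
    LEVEL = "level"
    RULE = "rule"
    DATABASE_ENTRY = "database entry"


# Illustrative role -> modifiable artifacts map, mixing the example embodiments above.
PERMISSIONS: Dict[str, Set[Artifact]] = {
    "user": {Artifact.LEVEL, Artifact.DATABASE_ENTRY},
    "administrator": set(Artifact),
}


def can_modify(role: str, artifact: Artifact) -> bool:
    """Unknown roles may modify nothing."""
    return artifact in PERMISSIONS.get(role, set())
```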

A method of an embodiment described herein, when implemented to execute on a data processing system, comprises a substantial advancement of the functionality of that data processing system. For example, an embodiment enables the data processing system to identify and unify particular variants that may not be allowed in information processing in certain languages or certain regions, before such variants enter such information processing. Such identification and unification ability is unavailable in presently operating data processing systems. Thus, a substantial advancement of such data processing systems by executing a method of an embodiment comprises the prevention or mitigation of the errors, costs, and delays in processing of the information caused by prior art data processing systems allowing the entry and storing of undesirable character variants.

The illustrative embodiments are described with respect to certain languages, characters, character variants, documents, identifiers, contexts, profiles, levels, databases, repositories, policies, logic, rules, data processing systems, environments, components, and applications only as examples. Any specific manifestations of such artifacts are not intended to be limiting to the invention. Any suitable manifestation of these and other similar artifacts can be selected within the scope of the illustrative embodiments.

Furthermore, the illustrative embodiments may be implemented with respect to any type of data, data source, or access to a data source over a data network. Any type of data storage device may provide the data to an embodiment of the invention, either locally at a data processing system or over a data network, within the scope of the invention.

The illustrative embodiments are described using specific code, designs, architectures, protocols, layouts, schematics, and tools only as examples and are not limiting to the illustrative embodiments. Furthermore, the illustrative embodiments are described in some instances using particular software, tools, and data processing environments only as an example for the clarity of the description. The illustrative embodiments may be used in conjunction with other comparable or similarly purposed structures, systems, applications, or architectures. An illustrative embodiment may be implemented in hardware, software, or a combination thereof.

The examples in this disclosure are used only for the clarity of the description and are not limiting to the illustrative embodiments. Additional data, operations, actions, tasks, activities, and manipulations will be conceivable from this disclosure and the same are contemplated within the scope of the illustrative embodiments.

Any advantages listed herein are only examples and are not intended to be limiting to the illustrative embodiments. Additional or different advantages may be realized by specific illustrative embodiments. Furthermore, a particular illustrative embodiment may have some, all, or none of the advantages listed above.

With reference to the figures and in particular with reference to FIGS. 1 and 2, these figures are example diagrams of data processing environments in which illustrative embodiments may be implemented. FIGS. 1 and 2 are only examples and are not intended to assert or imply any limitation with regard to the environments in which different embodiments may be implemented. A particular implementation may make many modifications to the depicted environments based on the following description.

FIG. 1 depicts a block diagram of a network of data processing systems in which illustrative embodiments may be implemented. Data processing environment 100 is a network of computers in which the illustrative embodiments may be implemented. Data processing environment 100 includes network 102. Network 102 is the medium used to provide communications links between various devices and computers connected together within data processing environment 100. Network 102 may include connections, such as wire, wireless communication links, or fiber optic cables. Server 104 and server 106 couple to network 102 along with storage unit 108. Software applications may execute on any computer in data processing environment 100.

In addition, clients 110, 112, and 114 couple to network 102. A data processing system, such as server 104 or 106, or client 110, 112, or 114, may contain data and may have software applications or software tools executing thereon.

Only as an example, and without implying any limitation to such architecture, FIG. 1 depicts certain components that are usable in an example implementation of an embodiment. For example, servers 104 and 106, and clients 110, 112, 114, are depicted as servers and clients only as examples and not to imply a limitation to a client-server architecture. As another example, an embodiment can be distributed across several data processing systems and a data network as shown, whereas another embodiment can be implemented on a single data processing system within the scope of the illustrative embodiments.

Input method application 103 is any suitable UIM as described herein. Application 105 implements an embodiment described herein. Unification rules 107 are a set of one or more unification rules usable in an embodiment. Unification databases 109 are a set of one or more unification repositories of any suitable type as described herein. Unification profiles 111 are a set of one or more unification profiles usable in an embodiment.

Servers 104 and 106, storage unit 108, and clients 110, 112, and 114 may couple to network 102 using wired connections, wireless communication protocols, or other suitable data connectivity. Clients 110, 112, and 114 may be, for example, personal computers or network computers.

In the depicted example, server 104 may provide data, such as boot files, operating system images, and applications to clients 110, 112, and 114. Clients 110, 112, and 114 may be clients to server 104 in this example. Clients 110, 112, 114, or some combination thereof, may include their own data, boot files, operating system images, and applications. Data processing environment 100 may include additional servers, clients, and other devices that are not shown.

In the depicted example, data processing environment 100 may be the Internet. Network 102 may represent a collection of networks and gateways that use the Transmission Control Protocol/Internet Protocol (TCP/IP) and other protocols to communicate with one another. At the heart of the Internet is a backbone of data communication links between major nodes or host computers, including thousands of commercial, governmental, educational, and other computer systems that route data and messages. Of course, data processing environment 100 also may be implemented as a number of different types of networks, such as, for example, an intranet, a local area network (LAN), or a wide area network (WAN). FIG. 1 is intended as an example, and not as an architectural limitation for the different illustrative embodiments.

Among other uses, data processing environment 100 may be used for implementing a client-server environment in which the illustrative embodiments may be implemented. A client-server environment enables software applications and data to be distributed across a network such that an application functions by using the interactivity between a client data processing system and a server data processing system. Data processing environment 100 may also employ a service oriented architecture where interoperable software components distributed across a network may be packaged together as coherent business applications.

With reference to FIG. 2, this figure depicts a block diagram of a data processing system in which illustrative embodiments may be implemented. Data processing system 200 is an example of a computer, such as servers 104 and 106, or clients 110, 112, and 114 in FIG. 1, or another type of device in which computer usable program code or instructions implementing the processes may be located for the illustrative embodiments. Data processing system 200 is also representative of other devices in which computer usable program code or instructions implementing the processes of the illustrative embodiments may be located. Data processing system 200 is described as a computer only as an example, without being limited thereto. Implementations in the form of other devices may modify data processing system 200 and even eliminate certain depicted components therefrom without departing from the general description of the operations and functions of data processing system 200 described herein.

In the depicted example, data processing system 200 employs a hub architecture including North Bridge and memory controller hub (NB/MCH) 202 and South Bridge and input/output (I/O) controller hub (SB/ICH) 204. Processing unit 206, main memory 208, and graphics processor 210 are coupled to North Bridge and memory controller hub (NB/MCH) 202. Processing unit 206 may contain one or more processors and may be implemented using one or more heterogeneous processor systems. Processing unit 206 may be a multi-core processor. Graphics processor 210 may be coupled to NB/MCH 202 through an accelerated graphics port (AGP) in certain implementations.

In the depicted example, local area network (LAN) adapter 212 is coupled to South Bridge and I/O controller hub (SB/ICH) 204. Audio adapter 216, keyboard and mouse adapter 220, modem 222, read only memory (ROM) 224, universal serial bus (USB) and other ports 232, and PCI/PCIe devices 234 are coupled to South Bridge and I/O controller hub 204 through bus 238. Hard disk drive (HDD) or solid-state drive (SSD) 226 and CD-ROM 230 are coupled to South Bridge and I/O controller hub 204 through bus 240. PCI/PCIe devices 234 may include, for example, Ethernet adapters, add-in cards, and PC cards for notebook computers. PCI uses a card bus controller, while PCIe does not. ROM 224 may be, for example, a flash basic input/output system (BIOS). Hard disk drive 226 and CD-ROM 230 may use, for example, an integrated drive electronics (IDE) or serial advanced technology attachment (SATA) interface, or variants such as external-SATA (eSATA) and micro-SATA (mSATA). A super I/O (SIO) device 236 may be coupled to South Bridge and I/O controller hub (SB/ICH) 204 through bus 238.

Memories, such as main memory 208, ROM 224, or flash memory (not shown),are some examples of computer usable storage devices. Hard disk drive orsolid state drive 226, CD-ROM 230, and other similarly usable devicesare some examples of computer usable storage devices including acomputer usable storage medium.

An operating system runs on processing unit 206. The operating systemcoordinates and provides control of various components within dataprocessing system 200 in FIG. 2. The operating system may be acommercially available operating system such as AIX® (AIX is a trademarkof International Business Machines Corporation in the United States andother countries), Microsoft® Windows® (Microsoft and Windows aretrademarks of Microsoft Corporation in the United States and othercountries), or Linux® (Linux is a trademark of Linus Torvalds in theUnited States and other countries). An object oriented programmingsystem, such as the Java™ programming system, may run in conjunctionwith the operating system and provides calls to the operating systemfrom Java™ programs or applications executing on data processing system200 (Java and all Java-based trademarks and logos are trademarks orregistered trademarks of Oracle Corporation and/or its affiliates).

Instructions for the operating system, the object-oriented programmingsystem, and applications or programs, such as input method application103, application 105, and unification rules 107 in FIG. 1, are locatedon storage devices, such as hard disk drive 226, and may be loaded intoat least one of one or more memories, such as main memory 208, forexecution by processing unit 206. The processes of the illustrativeembodiments may be performed by processing unit 206 using computerimplemented instructions, which may be located in a memory, such as, forexample, main memory 208, read only memory 224, or in one or moreperipheral devices.

The hardware in FIGS. 1-2 may vary depending on the implementation.Other internal hardware or peripheral devices, such as flash memory,equivalent non-volatile memory, or optical disk drives and the like, maybe used in addition to or in place of the hardware depicted in FIGS.1-2. In addition, the processes of the illustrative embodiments may beapplied to a multiprocessor data processing system.

In some illustrative examples, data processing system 200 may be apersonal digital assistant (PDA), which is generally configured withflash memory to provide non-volatile memory for storing operating systemfiles and/or user-generated data. A bus system may comprise one or morebuses, such as a system bus, an I/O bus, and a PCI bus. Of course, thebus system may be implemented using any type of communications fabric orarchitecture that provides for a transfer of data between differentcomponents or devices attached to the fabric or architecture.

A communications unit may include one or more devices used to transmit and receive data, such as a modem or a network adapter. A memory may be, for example, main memory 208 or a cache, such as the cache found in North Bridge and memory controller hub 202. A processing unit may include one or more processors or CPUs.

The depicted examples in FIGS. 1-2 and above-described examples are not meant to imply architectural limitations. For example, data processing system 200 also may be a tablet computer, laptop computer, or telephone device in addition to taking the form of a PDA.

With reference to FIG. 3, this figure depicts a table of example resemblance variants of an example character that can be configurably unified in accordance with an illustrative embodiment. Application 105 in FIG. 1 can be used to unify variants 302, 304, and 306 into variant 302.

Table 300 shows that Han character 302 is an original variant that corresponds to Unicode code point U+5317, and has at least two other resemblance variants 304 and 306, which correspond to code points U+F963 and U+2F82B, respectively. For example, if a user uses a Pinyin input method editor, a commonly used Chinese phonetic input method, and enters “bei” using a keyboard, the user is likely to be presented with all three variants, from which the user selects one variant to use.
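The three variants of table 300 look alike, but software sees three unrelated code points. The following is a minimal Python sketch using the standard-library `unicodedata` module to print each variant's code point and Unicode character name. The names `TABLE_300` and `describe` are hypothetical; the code points are the ones given above.

```python
import unicodedata

# Table 300 of FIG. 3 as (reference numeral, code point) pairs. The name
# TABLE_300 is hypothetical; the code points are the ones listed in the text.
TABLE_300 = [
    (302, 0x5317),   # original variant
    (304, 0xF963),   # resemblance variant
    (306, 0x2F82B),  # resemblance variant
]


def describe(code_point: int) -> str:
    """Return the code point in U+XXXX notation followed by its Unicode name."""
    return f"U+{code_point:04X} {unicodedata.name(chr(code_point))}"


for numeral, cp in TABLE_300:
    print(numeral, describe(cp))
```

The character names make the distinction visible. Variant 302 is a CJK unified ideograph, while variants 304 and 306 are CJK compatibility ideographs.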

Suppose a given usage context or circumstance requires Han characters, but the user selects variant 304, corresponding to code point U+F963, thereby selecting a Chinese-Japanese-Korean (CJK) compatibility ideograph instead. Such a selection would ordinarily cause an error or other complication in the processing of the data that includes the selected variant. If the character variant unification feature is enabled, such as by using an application implementing an embodiment, e.g., application 105 in FIG. 1, the unification feature unifies selected variant 304 to produce variant 302 as the output.
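For comparison, the Unicode Standard already defines one fixed form of this unification. Most CJK compatibility ideographs, including U+F963, have a singleton canonical decomposition to a unified ideograph, so Unicode normalization maps variant 304 to variant 302. The Python sketch below uses the standard-library `unicodedata.normalize` to illustrate this. It is offered only as a contrast and not as the embodiment's method. Normalization applies the same mapping in every circumstance, so it cannot select variant 306 or keep variant 304 the way a configurable unification rule may.

```python
import unicodedata

ORIGINAL_302 = "\u5317"  # variant 302, U+5317
COMPAT_304 = "\uF963"    # variant 304, U+F963, a CJK compatibility ideograph

# The two characters render alike, yet as code points they are unequal, so a
# comparison, search, or lookup that expects variant 302 misses variant 304.
assert COMPAT_304 != ORIGINAL_302

# Normalization (NFC or NFD) replaces the compatibility ideograph with its
# canonical equivalent, which for U+F963 is U+5317.
normalized = unicodedata.normalize("NFC", COMPAT_304)
print(f"U+{ord(normalized):04X}")  # prints U+5317
```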

An entry in a unification database in databases 109 of FIG. 1 establishes the correspondence between variants 302, 304, and 306. A unification rule in rules 107 of FIG. 1 allows the application to determine that, under the circumstances of the selection, variant 304 should be changed to variant 302. Only as an example to illustrate the operation of unification rules, and not to imply any limitation on the illustrative embodiments, another unification rule in rules 107 of FIG. 1 may allow the application to determine that, under different circumstances of the selection, variant 304 should be changed to variant 306, or that variant 304 should be accepted as the correct variant.
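The circumstance-dependent behavior of the rules described above can be sketched in Python as follows. This sketch assumes that one database entry lists the three variants and that each rule returns either a replacement code point or `None` to mean "does not apply". The circumstance labels `han_required` and `variant_306_expected`, along with all function names, are hypothetical.

```python
from typing import Callable, List, Optional

# Hypothetical unification database entry for FIG. 3, keyed by reference numeral.
ENTRY = {302: 0x5317, 304: 0xF963, 306: 0x2F82B}

# A rule receives the selected code point and a circumstance label and returns
# a replacement code point, or None when the rule does not apply.
Rule = Callable[[int, str], Optional[int]]


def rule_prefer_302(cp: int, circumstance: str) -> Optional[int]:
    # Hypothetical circumstance in which unified Han characters are required.
    if circumstance == "han_required" and cp == ENTRY[304]:
        return ENTRY[302]
    return None


def rule_prefer_306(cp: int, circumstance: str) -> Optional[int]:
    # Hypothetical circumstance in which variant 306 is the expected form.
    if circumstance == "variant_306_expected" and cp == ENTRY[304]:
        return ENTRY[306]
    return None


RULES: List[Rule] = [rule_prefer_302, rule_prefer_306]


def unify(cp: int, circumstance: str) -> int:
    """Return the first replacement any rule proposes; otherwise accept cp."""
    for rule in RULES:
        replacement = rule(cp, circumstance)
        if replacement is not None:
            return replacement
    return cp
```

When no rule fires, the selection is kept. This corresponds to the case in which variant 304 is accepted as the correct variant.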

With reference to FIG. 4, this figure depicts a block diagram of a configuration for configurable character variant unification in accordance with an illustrative embodiment. Application 402 can be implemented using application 105 in FIG. 1.

User 404 uses input device 406 to enter a search for a character. UIM 408 presents a set of characters from which the user selects character 410. UIM 408 produces code point 412 of selected character 410. Code point 412 serves as an input to application 402.
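Code point 412 is the integer value of the selected character. The following minimal Python sketch shows how an application might receive it and format it in the conventional U+XXXX notation. The helper names `code_point_of` and `to_label` are hypothetical. In Python, a `str` is a sequence of code points, so a supplementary-plane variant such as U+2F82B is a single element. An environment based on UTF-16 would instead store it as a surrogate pair and must combine the pair before looking the character up.

```python
def code_point_of(selected: str) -> int:
    """Return the Unicode code point of the single character the user selected."""
    if len(selected) != 1:
        raise ValueError("expected exactly one character")
    return ord(selected)


def to_label(code_point: int) -> str:
    """Format a code point in U+XXXX notation (at least four hex digits)."""
    return f"U+{code_point:04X}"


# U+2F82B lies outside the Basic Multilingual Plane but is one str element here.
print(to_label(code_point_of("\U0002F82B")))
```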

Component 414 allows a user or administrator to define one or more unification profiles, one or more unification levels within a unification profile, one or more unification rules, one or more unification databases, or a combination thereof. Component 416 selects a unification profile, e.g., unification profile 418, and a level therein, and one or more unification databases according to profile 418, e.g., unification database 420.
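The following is one possible data model for what component 414 defines, sketched with Python dataclasses. All class and field names are hypothetical. For brevity, this sketch folds a unification rule into each database as a simple variant-to-replacement mapping, whereas the description keeps rules and databases as separate items.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UnificationDatabase:
    name: str
    # Maps a variant code point to the code point it is unified to.
    mapping: Dict[int, int] = field(default_factory=dict)


@dataclass
class UnificationLevel:
    name: str
    databases: List[UnificationDatabase] = field(default_factory=list)


@dataclass
class UnificationProfile:
    name: str
    levels: Dict[str, UnificationLevel] = field(default_factory=dict)


# Hypothetical configuration that component 414 might record.
compat_db = UnificationDatabase("cjk-compat", {0xF963: 0x5317, 0x2F82B: 0x5317})
profile_418 = UnificationProfile(
    "han-documents",
    levels={
        "strict": UnificationLevel("strict", [compat_db]),
        "lenient": UnificationLevel("lenient", []),
    },
)
```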

Component 422 applies one or more unification rules 424 to code point 412 according to the selected level in profile 418. Component 422 produces output character variant 426, or a Unicode code point corresponding thereto.
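The handoff between component 416 (selecting databases for a profile and level) and component 422 (applying them to code point 412) might look like the following minimal Python sketch. The configuration layout and function names are hypothetical. The rule applied here, "the first database that lists the code point supplies its replacement", is an assumption chosen for illustration.

```python
from typing import Dict, List

# Hypothetical configuration: profile name -> level name -> list of databases,
# where each database maps a variant code point to its replacement.
PROFILES: Dict[str, Dict[str, List[Dict[int, int]]]] = {
    "han-documents": {
        "strict": [{0xF963: 0x5317, 0x2F82B: 0x5317}],
        "lenient": [],
    },
}


def select_databases(profile: str, level: str) -> List[Dict[int, int]]:
    """Component 416 (sketch): pick the databases for a profile and level."""
    return PROFILES[profile][level]


def apply_rules(code_point: int, databases: List[Dict[int, int]]) -> int:
    """Component 422 (sketch): the first database listing the code point wins."""
    for db in databases:
        if code_point in db:
            return db[code_point]
    return code_point


# Code point 412 (U+F963) in, output variant 426 out.
output_426 = apply_rules(0xF963, select_databases("han-documents", "strict"))
```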

With reference to FIG. 5, this figure depicts a flowchart of an example process for configurable character variant unification in accordance with an illustrative embodiment. Process 500 can be implemented in application 402 in FIG. 4.

The application receives a Unicode code point value of a selected character variant from a UIM (block 502). The application determines a unification profile that is active or applicable for unifying the selected character variant, and a unification level in that profile (block 504).

For example, according to one embodiment, the profile and level are selected by a user or administrator, and the application uses the selected profile and level in block 504. According to another embodiment, the application determines a circumstance of the selected character variant, e.g., the user's identification, the user's membership in a group, the document where the selected character variant is going to be used, a context in which the variant has been selected, an application that is to receive the character from the user, other circumstances, or a combination thereof. Based on the circumstance of usage, the application selects a suitable profile and a level therein for use in block 504.
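Both embodiments of block 504 are sketched below in Python. An explicit choice by a user or administrator takes precedence. Otherwise, an ordered selection table is matched against the circumstance. The `Circumstance` fields follow the examples listed above. The table contents, the application name `records-app`, the group `linguists`, and the fallback `DEFAULT_CHOICE` are hypothetical assumptions and do not come from the description.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Circumstance:
    user_id: str
    group: Optional[str] = None
    document: Optional[str] = None
    target_application: Optional[str] = None


ProfileLevel = Tuple[str, str]

# Hypothetical selection table, checked in order; the first match wins.
SELECTION_TABLE: List[Tuple[Callable[[Circumstance], bool], ProfileLevel]] = [
    (lambda c: c.target_application == "records-app", ("han-documents", "strict")),
    (lambda c: c.group == "linguists", ("han-documents", "lenient")),
]
DEFAULT_CHOICE: ProfileLevel = ("han-documents", "strict")  # assumed fallback


def select_profile(c: Circumstance,
                   admin_choice: Optional[ProfileLevel] = None) -> ProfileLevel:
    # First embodiment: an explicit user or administrator choice is used as is.
    if admin_choice is not None:
        return admin_choice
    # Second embodiment: derive the profile and level from the circumstance.
    for matches, choice in SELECTION_TABLE:
        if matches(c):
            return choice
    return DEFAULT_CHOICE
```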

The application selects one or more unification databases according to the selected profile and level (block 506). The application applies one or more unification rules to the selected character variant using the selected unification databases of block 506 (block 508).

From applying a unification rule, the application determines whether the selected character variant complies with the rule (block 510). If the selected character variant is the correct variant (“Yes” path of block 510), the application sends the selected character variant to the target application that was to receive the character (block 512). The application ends process 500 thereafter.

If the selected character variant is not the correct variant (“No” path of block 510), the application replaces the Unicode code point received in block 502 with a Unicode code point of another character variant identified in a unification database according to a unification rule (block 514). The application sends the replacement character variant to the target application in block 514 and ends process 500 thereafter.
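Process 500 can be condensed into a single function, with comments marking blocks 502 through 514. This is a minimal Python sketch under stated assumptions. The caller supplies the profile and level for block 504. The `DATABASES` layout is hypothetical. The compliance rule used for block 510, "a variant complies unless a selected database lists a replacement for it", is one illustrative choice among many.

```python
from typing import Dict, List, Tuple

# Hypothetical configuration: (profile, level) -> unification databases, each
# mapping a variant code point to the code point that should replace it.
DATABASES: Dict[Tuple[str, str], List[Dict[int, int]]] = {
    ("han-documents", "strict"): [{0xF963: 0x5317, 0x2F82B: 0x5317}],
    ("han-documents", "lenient"): [],
}


def process_500(code_point: int, profile_and_level: Tuple[str, str]) -> int:
    """Return the code point to send to the target application."""
    # Block 502: code_point is the value received from the UIM.
    # Block 504: the caller supplies the active profile and level.
    # Block 506: select the unification databases for that profile and level.
    databases = DATABASES.get(profile_and_level, [])
    # Blocks 508-510: in this sketch the rule is "a variant complies unless a
    # selected database lists a replacement for it".
    for db in databases:
        if code_point in db:
            # "No" path, block 514: send the replacement code point.
            return db[code_point]
    # "Yes" path, block 512: send the selected variant unchanged.
    return code_point
```

Returning the code point, rather than calling the target application directly, keeps the sketch testable. A caller would pass the result to whatever application was to receive the character.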

With reference to FIG. 6, this figure depicts a flowchart of an example process for configuring character variant unification in accordance with an illustrative embodiment. Process 600 can be implemented in application 402 in FIG. 4.

The application causes to be created, or creates, a unification profile that is applicable to a circumstance in which a character variant might be selected (block 602). The application causes to be defined, or defines, a unification level in the unification profile of block 602 (block 604).

For a level in the profile, the application associates one or more unification databases with the profile level (block 606). The application enables the profile and/or level to be used with a set of unification rules (block 608). For example, an administrator may desire to create some profiles and/or levels in reserve without enabling them for use. Similarly, an administrator may enable or disable a profile or a level within the profile according to changing needs for character variant unification.

The application repeats blocks 604-608 for as many levels as may be desired in a profile. The application repeats blocks 602-608 for as many profiles with as many levels as may be desired in a given implementation. The application ends process 600 thereafter.
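Process 600 can be sketched as a small configuration store whose methods correspond to blocks 602 through 608, followed by a loop that repeats them. This is an illustrative Python sketch, not the embodiment's implementation. All names are hypothetical, including the database names `cjk-compat` and `regional`. The sketch also assumes that a level is usable only when both it and its profile are enabled. Newly created profiles and levels start disabled, which allows an administrator to hold some in reserve.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Level:
    databases: List[str] = field(default_factory=list)  # database names
    enabled: bool = False


@dataclass
class Profile:
    circumstance: str
    levels: Dict[str, Level] = field(default_factory=dict)
    enabled: bool = False


class UnificationConfig:
    """Hypothetical store that process 600 populates."""

    def __init__(self) -> None:
        self.profiles: Dict[str, Profile] = {}

    def create_profile(self, name: str, circumstance: str) -> None:
        self.profiles[name] = Profile(circumstance)  # block 602

    def define_level(self, profile: str, level: str) -> None:
        self.profiles[profile].levels[level] = Level()  # block 604

    def associate_database(self, profile: str, level: str, db: str) -> None:
        self.profiles[profile].levels[level].databases.append(db)  # block 606

    def set_enabled(self, profile: str, level: Optional[str],
                    enabled: bool) -> None:
        # Block 608: enable or disable a whole profile (level=None) or one level.
        if level is None:
            self.profiles[profile].enabled = enabled
        else:
            self.profiles[profile].levels[level].enabled = enabled

    def is_usable(self, profile: str, level: str) -> bool:
        # Assumption: a level is usable only if it and its profile are enabled.
        p = self.profiles[profile]
        return p.enabled and p.levels[level].enabled


# Blocks 602-608 repeated: one profile, two levels, one level held in reserve.
config = UnificationConfig()
config.create_profile("han-documents", "documents that require Han characters")
for level_name, db_names in {"strict": ["cjk-compat"],
                             "reserve": ["cjk-compat", "regional"]}.items():
    config.define_level("han-documents", level_name)
    for db_name in db_names:
        config.associate_database("han-documents", level_name, db_name)
config.set_enabled("han-documents", None, True)
config.set_enabled("han-documents", "strict", True)
```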

Thus, a computer implemented method, system or apparatus, and computer program product are provided in the illustrative embodiments for configurable character variant unification.

The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.

The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.

Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

1-11. (canceled)
12. A computer usable program product comprising a computer readable storage device including computer usable code for configurable character variant unification, the computer usable code comprising: computer usable code for determining that a unification profile is applicable to a circumstance in which a character variant has been selected, wherein the character variant is a variation of a character in a set of variations of the character such that each variation of the character in the set is represented by a unique Unicode code point; computer usable code for identifying a unification repository according to the profile; computer usable code for determining whether the character variant satisfies a unification rule; computer usable code for selecting, responsive to the character variant not satisfying the unification rule, a different variation of the character from the unification repository, the different variation forming a replacement character variant; and computer usable code for using the replacement character variant in place of the character variant.
13. The computer usable program product of claim 12, further comprising: computer usable code for receiving a Unicode code point of the character variant; and computer usable code for outputting a different Unicode code point corresponding to the replacement character variant.
14. The computer usable program product of claim 12, further comprising: computer usable code for determining that the circumstance relates to a level in the unification profile, wherein the unification database is selected according to the level in the unification profile.
15. The computer usable program product of claim 12, wherein a user selects the character variant, and the circumstance comprising: a membership of the user in a group.
16. The computer usable program product of claim 12, the circumstance comprising: a geographical region related to a user who selects the character variant.
17. The computer usable program product of claim 12, the circumstance comprising: a language related to a user who selects the character variant.
18. The computer usable program product of claim 12, wherein the computer usable code is stored in a computer readable storage medium in a data processing system, and wherein the computer usable code is transferred over a network from a remote data processing system.
19. The computer usable program product of claim 12, wherein the computer usable code is stored in a computer readable storage medium in a server data processing system, and wherein the computer usable code is downloaded over a network to a remote data processing system for use in a computer readable storage medium associated with the remote data processing system.
20. A data processing system for configurable character variant unification, the data processing system comprising: a storage device including a storage medium, wherein the storage device stores computer usable program code; and a processor, wherein the processor executes the computer usable program code, and wherein the computer usable program code comprises: computer usable code for determining that a unification profile is applicable to a circumstance in which a character variant has been selected, wherein the character variant is a variation of a character in a set of variations of the character such that each variation of the character in the set is represented by a unique Unicode code point; computer usable code for identifying a unification repository according to the profile; computer usable code for determining whether the character variant satisfies a unification rule; computer usable code for selecting, responsive to the character variant not satisfying the unification rule, a different variation of the character from the unification repository, the different variation forming a replacement character variant; and computer usable code for using the replacement character variant in place of the character variant.